NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Guiding long-horizon task and motion planning with vision language models.

Yang, Zhutian; Garrett, Caelan; Kumar, Nishanth; Fox, Dieter; Lozano-Perez, Tomas; Kaelbling, Leslie (June 2025, Proceedings IEEE International Conference on Robotics and Automation)

ision-Language Models (VLM) can generate plausible high-level plans when prompted with a goal, the context, an image of the scene, and any planning constraints. However, there is no guarantee that the predicted actions are geometrically and kinematically feasible for a particular robot embodiment. As a result, many prerequisite steps such as opening drawers to access objects are often omitted in their plans. Robot task and motion planners can generate motion trajectories that respect the geometric feasibility of actions and insert physically necessary actions, but do not scale to everyday problems that require common-sense knowledge and involve large state spaces comprised of many variables. We propose VLM-TAMP, a hierarchical planning algorithm that leverages a VLM to generate goth semantically-meaningful and horizon-reducing intermediate subgoals that guide a task and motion planner. When a subgoal or action cannot be refined, the VLM is queried again for replanning. We evaluate VLMTAMP on kitchen tasks where a robot must accomplish cooking goals that require performing 30-50 actions in sequence and interacting with up to 21 objects. VLM-TAMP substantially outperforms baselines that rigidly and independently execute VLM-generated action sequences, both in terms of success rates (50 to 100% versus 0%) and average task completion percentage (72 to 100% versus 15 to 45%).
more » « less
Full Text Available
Differentiable GPU-Parallelized Task and Motion Planning

Shen, William; Garrett, Caelan; Kumar, Nishanth; Goyal, Ankit; Hermans, Tucker; Lozano-Perez, Tomas; Ramos, Fabio (June 2025, Robotics science and systems)

Planning long-horizon robot manipulation requires making discrete decisions about which objects to interact with and continuous decisions about how to interact with them. A robot planner must select grasps, placements, and motions that are feasible and safe. This class of problems falls under Task and Motion Planning (TAMP) and poses significant computational challenges in terms of algorithm runtime and solution quality, particularly when the solution space is highly constrained. To address these challenges, we propose a new bilevel TAMP algorithm that leverages GPU parallelism to efficiently explore thousands of candidate continuous solutions simultaneously. Our approach uses GPU parallelism to sample an initial batch of solution seeds for a plan skeleton and to apply differentiable optimization on this batch to satisfy plan constraints and minimize solution cost with respect to soft objectives. We demonstrate that our algorithm can effectively solve highly constrained problems with non-convex constraints in just seconds, substantially outperforming serial TAMP approaches, and validate our approach on multiple realworld robots.
more » « less
Full Text Available
Trust the proc3s: Solving long-horizon robotics problems with llms and constraint satisfaction.

Curtis, Aidan; Kumar, Nishanth; Cao, Jing; Lozano-Perez, Tomas; Kaelbling, Leslie (November 2024, Conference on Robot Learning)

Recent developments in pretrained large language models (LLMs) ap- plied to robotics have demonstrated their capacity for sequencing a set of discrete skills to achieve open-ended goals in simple robotic tasks. In this paper, we ex- amine the topic of LLM planning for a set of continuously parameterized skills whose execution must avoid violations of a set of kinematic, geometric, and phys- ical constraints. We prompt the LLM to output code for a function with open parameters, which, together with environmental constraints, can be viewed as a Continuous Constraint Satisfaction Problem (CCSP). This CCSP can be solved through sampling or optimization to find a skill sequence and continuous param- eter settings that achieve the goal while avoiding constraint violations. Addition- ally, we consider cases where the LLM proposes unsatisfiable CCSPs, such as those that are kinematically infeasible, dynamically unstable, or lead to colli- sions, and re-prompt the LLM to form a new CCSP accordingly. Experiments across simulated and real-world domains demonstrate that our proposed strategy, PRoC3S, is capable of solving a wide range of complex manipulation tasks with realistic constraints much more efficiently and effectively than existing baselines.
more » « less
Full Text Available
Adaptive Language-Guided Abstraction from Contrastive Explanations

Peng, Andi; Li, Belinda; Sucholutsky, Ilia; Kumar, Nishanth; Shah, Julie; Andreas, Jacob; Bobu, Andreea (November 2024, Conference on Robot Learning)

Full Text Available
Practice Makes Perfect: Planning to Learn Skill Parameter Policies

Kumar, Nishanth; Silver, Tom; McClinton, Willie; Zhao, Linfeng; Proulx, Stephen; Lozano-Perez, Tomas; Kaelbling, Leslie; Barry, Jennifer (July 2024, Robotics: Science and Systems Proceedings 2024)

One promising approach towards effective robot decision making in complex, long-horizon tasks is to sequence together parameterized skills. We consider a setting where a robot is initially equipped with (1) a library of parameterized skills, (2) an AI planner for sequencing together the skills given a goal, and (3) a very general prior distribution for selecting skill parameters. Once deployed, the robot should rapidly and autonomously learn to improve its performance by specializing its skill parameter selection policy to the particular objects, goals, and constraints in its environment. In this work, we focus on the active learning problem of choosing which skills to practice to maximize expected future task success. We propose that the robot should estimate the competence of each skill, extrapolate the competence (asking: “how much would the competence improve through practice?”), and situate the skill in the task distribution through competence- aware planning. This approach is implemented within a fully autonomous system where the robot repeatedly plans, practices, and learns without any environment resets. Through experiments in simulation, we find that our approach learns effective pa- rameter policies more sample-efficiently than several baselines. Experiments in the real-world demonstrate our approach’s ability to handle noise from perception and control and improve the robot’s ability to solve two long-horizon mobile-manipulation tasks after a few hours of autonomous practice. Project website: http://ees.csail.mit.edu
more » « less
Full Text Available
Learning Efficient Abstract Planning Models that Choose What to Predict

Kumar, Nishanth; McClinton, Willie; Chitnis, Rohan; Silver, Tom; Lozano-Perez, Tomas; Kaelbling, Leslie (November 2023, Proceedings of Machine Learning Research: Conference on Robot Learning (CoRL) 2023)

continuous state and action spaces is bilevel planning, wherein a high- level search over an abstraction of an environment is used to guide low-level decision-making. Recent work has shown how to enable such bilevel planning by learning abstract models in the form of symbolic operators and neural sam- plers. In this work, we show that existing symbolic operator learning approaches fall short in many robotics domains where a robot’s actions tend to cause a large number of irrelevant changes in the abstract state. This is primarily because they attempt to learn operators that exactly predict all observed changes in the abstract state. To overcome this issue, we propose to learn operators that ‘choose what to predict’ by only modelling changes necessary for abstract planning to achieve specified goals. Experimentally, we show that our approach learns operators that lead to efficient planning across 10 different hybrid robotics domains, including 4 from the challenging BEHAVIOR-100 benchmark, while generalizing to novel initial states, goals, and objects.
more » « less
Full Text Available
Predicate Invention for Bilevel Planning

https://doi.org/10.1609/aaai.v37i10.26429

Silver, Tom; Chitnis, Rohan; Kumar. Nishanth; McClinton, Willie; Lozano-Perez, Tomas; Kaelbling, Leslie; Tenenbaum, Joshua (February 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

Efficient planning in continuous state and action spaces is fundamentally hard, even when the transition model is deterministic and known. One way to alleviate this challenge is to perform bilevel planning with abstractions, where a highlevel search for abstract plans is used to guide planning in the original transition space. Previous work has shown that when state abstractions in the form of symbolic predicates are hand-designed, operators and samplers for bilevel planning can be learned from demonstrations. In this work, we propose an algorithm for learning predicates from demonstrations, eliminating the need for manually specified state abstractions. Our key idea is to learn predicates by optimizing a surrogate objective that is tractable but faithful to our real efficient-planning objective. We use this surrogate objective in a hill-climbing search over predicate sets drawn from a grammar. Experimentally, we show across four robotic planning environments that our learned abstractions are able to quickly solve held-out tasks, outperforming six baselines.
more » « less
Full Text Available
Multi-Object Search using Object-Oriented POMDPs

https://doi.org/10.1109/ICRA.2019.8793888

Wandzel, Arthur; Oh, Yoonseon; Fishman, Michael; Kumar, Nishanth; Wong, Lawson L.S.; Tellex, Stefanie (May 2019, IEEE International Conference on Robotics and Automation (ICRA))

Full Text Available

Search for: All records